Silent speech recognition from articulatory movements using deep neural network
Authors
Abstract
Laryngectomee patients lose their ability to produce speech sounds and struggle in their daily communication. There are currently limited communication options for these patients. Silent speech interfaces (SSIs), which recognize speech from articulatory information (i.e., without using audio information), have the potential to assist the oral communication of persons with laryngectomy or other speech or voice disorders. One of the challenging problems in SSI development is to accurately recognize speech from articulatory data. The deep neural network (DNN)-hidden Markov model (HMM) approach has recently been used successfully in (acoustic) speech recognition, where it shows significant improvements over the long-standing Gaussian mixture model (GMM)-HMM approach. DNN-HMM, however, has rarely been used in silent speech recognition. This paper investigated the use of DNN-HMM in recognizing speech from articulatory movement data. The articulatory data in the MOCHA-TIMIT data set were used in the experiment. Results indicated a performance improvement of DNN-HMM over GMM-HMM in silent speech recognition.
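In a hybrid DNN-HMM recognizer, the DNN replaces the GMM as the observation model: it estimates per-frame posteriors over HMM states, which are divided by the state priors to obtain scaled likelihoods for HMM decoding. The abstract gives no architectural details, so the following is only a minimal sketch of that idea in PyTorch; the input stacking, layer sizes, and state count are invented for illustration.

```python
# Minimal sketch of the DNN part of a hybrid DNN-HMM recognizer for
# articulatory data. All dimensions are assumptions for illustration,
# not values from the paper.
import math
import torch
import torch.nn as nn

N_ARTIC = 14    # e.g., x/y coordinates of 7 EMA sensors (assumed)
CONTEXT = 11    # articulatory frames stacked per input window (assumed)
N_STATES = 120  # number of tied HMM states (assumed)

class ArticulatoryDNN(nn.Module):
    """Maps a window of articulatory frames to HMM-state posteriors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_ARTIC * CONTEXT, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, N_STATES),
        )

    def forward(self, x):
        return self.net(x)  # logits over HMM states

model = ArticulatoryDNN()
frames = torch.randn(8, N_ARTIC * CONTEXT)        # a batch of stacked frames
log_post = torch.log_softmax(model(frames), -1)   # log P(state | articulation)

# Hybrid decoding uses scaled likelihoods:
#   log p(obs | state) ∝ log P(state | obs) - log P(state).
# A uniform prior is assumed here for simplicity; real systems estimate
# priors from the state-level alignments of the training data.
log_prior = torch.full((N_STATES,), -math.log(N_STATES))
scaled_loglik = log_post - log_prior              # passed to the HMM decoder
```

In a full system the state targets for training would come from forced alignments produced by a baseline GMM-HMM, and the scaled likelihoods would feed a Viterbi decoder.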
Similar Resources
Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Silent speech recognition (SSR) converts non-audio information such as articulatory (tongue and lip) movements to text. Articulatory movements generally carry less information than acoustic features for speech recognition, and therefore, the performance of SSR may be limited. Multiview representation learning, which can learn better representations by analyzing multiple information sources simul...
Full Text
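Deep CCA trains one network per view (e.g., articulatory and acoustic) so that the two projected views are maximally correlated; the objective is the sum of singular values of the whitened cross-covariance of the projected views. The snippet above is truncated, so the sketch below shows only the standard deep CCA correlation loss (Andrew et al., 2013), not this paper's exact setup; the batch size, dimensions, and regularizer are assumptions.

```python
# Sketch of the deep CCA correlation objective: given minibatch outputs
# H1, H2 (samples x features) from the two view networks, maximize the
# total correlation between the projected views. The regularizer r and
# all dimensions below are illustrative assumptions.
import torch

def dcca_loss(H1, H2, r=1e-4):
    m = H1.shape[0]                       # samples in the batch
    H1 = H1 - H1.mean(0, keepdim=True)    # center each view
    H2 = H2 - H2.mean(0, keepdim=True)
    S12 = H1.T @ H2 / (m - 1)             # cross-covariance
    S11 = H1.T @ H1 / (m - 1) + r * torch.eye(H1.shape[1])
    S22 = H2.T @ H2 / (m - 1) + r * torch.eye(H2.shape[1])

    def inv_sqrt(S):                      # S^(-1/2) via eigendecomposition
        w, V = torch.linalg.eigh(S)
        return V @ torch.diag(w.clamp_min(1e-12).rsqrt()) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    corr = torch.linalg.svdvals(T).sum()  # total canonical correlation
    return -corr                          # minimize negative correlation

# Toy usage: two random "views" of the same 32 samples.
H1 = torch.randn(32, 10, requires_grad=True)
H2 = torch.randn(32, 8)
loss = dcca_loss(H1, H2)
loss.backward()                           # gradients flow to the view networks
```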
Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary
To help people who have lost their voice following total laryngectomy, we present a speech restoration system that produces audible speech from articulator movement. The speech articulators are monitored by sensing changes in magnetic field caused by movements of small magnets attached to the lips and tongue. Then, articulator movement is mapped to a sequence of speech parameter vectors using a...
Full Text
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and the training of an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. Here, an SER method has been proposed based on a concat...
Full Text
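A scalogram is the magnitude of a continuous wavelet transform, an image-like time-frequency representation that a deep network can consume much like a spectrogram. Since the snippet is cut off, the sketch below only illustrates computing a scalogram with PyWavelets on a synthetic signal; the wavelet, scales, and sampling rate are assumptions rather than the paper's settings.

```python
# Sketch: computing a scalogram (|CWT|) of a short signal with PyWavelets,
# the kind of time-frequency image a deep SER model could take as input.
# Wavelet choice, scales, and sampling rate are assumptions for illustration.
import numpy as np
import pywt

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)               # 0.5 s toy signal
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

scales = np.arange(1, 129)                  # 128 wavelet scales (assumed)
coeffs, freqs = pywt.cwt(signal, scales, 'morl', sampling_period=1 / fs)
scalogram = np.abs(coeffs)                  # shape: (n_scales, n_samples)

print(scalogram.shape, freqs[:3])           # image fed to a CNN-style model
```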
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings
Automatic prediction of articulatory movements from speech or text can be beneficial for many applications such as speech recognition and synthesis. A recent approach has reported state-of-the-art performance in speech-to-articulatory prediction using feed-forward neural networks. In this paper, we investigate the feasibility of using bidirectional long short-term memory based recurrent neural n...
Full Text
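A bidirectional LSTM reads the input feature sequence in both directions, so the articulatory prediction at each frame can draw on both past and future context, which a feed-forward network sees only through a fixed window. Below is a minimal PyTorch sketch of such a speech-to-articulatory regressor; the feature dimensions and layer sizes are assumptions, not the paper's configuration.

```python
# Minimal sketch of a bidirectional LSTM regressor mapping acoustic
# feature sequences to articulatory trajectories. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

N_ACOUSTIC = 40   # e.g., filterbank features per frame (assumed)
N_ARTIC = 14      # e.g., x/y positions of 7 EMA sensors (assumed)

class BLSTMRegressor(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(N_ACOUSTIC, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, N_ARTIC)  # forward + backward states

    def forward(self, x):                 # x: (batch, frames, N_ACOUSTIC)
        h, _ = self.blstm(x)
        return self.out(h)                # (batch, frames, N_ARTIC)

model = BLSTMRegressor()
speech = torch.randn(4, 200, N_ACOUSTIC)  # 4 utterances of 200 frames
pred = model(speech)                      # predicted articulator trajectories
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))  # toy target
```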
Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training
Silent speech recognition (SSR) converts non-audio information (e.g., articulatory information) to text. SSR has the potential to enable laryngectomees to produce synthesized speech with a natural-sounding voice. Despite its recent advances, current SSR research has largely relied on speaker-dependent recognition. A high degree of variation in articulatory patterns across different talkers has been...
Full Text
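The abstract names across-speaker articulatory normalization without specifying the algorithm. One common way to align articulatory spaces across talkers is Procrustes matching, which removes translation, scale, and rotation differences between speakers' sensor coordinates; the sketch below illustrates that general idea on invented data and should not be read as this paper's method.

```python
# Sketch of Procrustes-style alignment of one speaker's articulatory
# points to a reference speaker's space: remove translation and scale,
# then solve for the optimal rotation via SVD. This is one common
# normalization, not necessarily the one used in the paper.
import numpy as np

def procrustes_align(source, target):
    """Map source points (n x d) into the frame of target points (n x d)."""
    S = source - source.mean(0)                     # remove translation
    T = target - target.mean(0)
    S = S / np.linalg.norm(S)                       # remove scale
    T = T / np.linalg.norm(T)
    U, _, Vt = np.linalg.svd(T.T @ S)               # optimal rotation
    R = U @ Vt
    return S @ R.T                                  # source in target's frame

# Toy usage: a rotated, shifted, scaled copy of 2-D sensor positions.
rng = np.random.default_rng(0)
ref = rng.normal(size=(7, 2))                       # reference speaker
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
other = 1.7 * ref @ rot.T + np.array([5.0, -2.0])   # second speaker
aligned = procrustes_align(other, ref)              # ≈ normalized ref points
```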